Fix hidden states and quant kv cache #10854
Li-Z-Q commented Jul 17, 2025
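- Support returning hidden_states from the quantized code path
- Support quantized loading for embedding models, in two modes: weight_only_int8 and weight_only_int4
- When an embedding model is loaded with quantization, preallocate only the first layer's kv_cache and reuse it in subsequent computation, which reduces GPU memory usage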
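The memory saving from the last item can be illustrated with a minimal sketch. It is not the PaddleNLP implementation: `allocate_caches` and `allocated_bytes` are hypothetical helpers, a `bytearray` stands in for a device tensor, and the `kv_cache_reuse` flag name is borrowed from the diff in this PR.

```python
def allocate_caches(num_layers, cache_shape, kv_cache_reuse=False):
    """Return one cache entry per layer (hypothetical helper).

    With kv_cache_reuse=True every entry refers to the same buffer,
    so only a single layer's worth of memory is allocated.
    """
    def new_buffer():
        size = 1
        for dim in cache_shape:
            size *= dim
        return bytearray(size)  # stand-in for a device tensor

    if kv_cache_reuse:
        shared = new_buffer()
        return [shared] * num_layers  # all layers share one buffer
    return [new_buffer() for _ in range(num_layers)]


def allocated_bytes(caches):
    # Count each distinct buffer once, even if several layers share it.
    distinct = {id(buf): buf for buf in caches}
    return sum(len(buf) for buf in distinct.values())
```

Under these assumptions, a 4-layer model with a per-layer cache of shape `(2, 8, 16)` allocates a quarter of the memory when the flag is set, because the cache cost no longer grows with the number of layers.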
Thanks for your contribution!
@@ -1481,7 +1481,7 @@ def forward(
         self.pre_process(**kwargs)
         kwargs["cum_offsets"] = cum_offsets

-        if caches is not None:
+        if caches is not None and not kwargs["kv_cache_reuse"]:
This line needs to check whether kv_cache_reuse is present in kwargs; if it is not, fall back to a default value.
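One way to address this review comment is to read the flag with `dict.get`, so a caller that never passes `kv_cache_reuse` does not hit a `KeyError`. The sketch below is illustrative only: `should_update_caches` is a hypothetical helper, and the default of `False` is an assumption, since the thread does not state which default the PR settled on.

```python
def should_update_caches(caches, **kwargs):
    """Mirror the condition from the diff, with a safe lookup (hypothetical helper)."""
    # kwargs["kv_cache_reuse"] raises KeyError when the key is absent;
    # kwargs.get falls back to an assumed default of False instead.
    kv_cache_reuse = kwargs.get("kv_cache_reuse", False)
    return caches is not None and not kv_cache_reuse
```

With this lookup, existing call sites that do not know about the flag keep their previous behavior, and only callers that explicitly pass `kv_cache_reuse=True` skip the branch.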
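The [PaddleNLP-CI] job failed. Manual verification shows that the PR code raises an error when running the grpo case. The manual reproduction command is as follows: Error message:
The network issue in the Test job has been fixed; please merge the latest develop code.
Merged.
Fixed by changing the default value of kv_cache_reuse.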
LGTM
* fix hidden states
* fix quant kv_cache
* fix Lint style
* fix kv_cache_reuse key error
* fix kv_cache_reuse key error
* remove unused code
* fix kv_cache_reuse default